9

Distribution of Practice


The Traditional Case for DP Superiority 

Task Acquisition 
Task Retention 

DP Superiority in the Long Run 
DP and Single-Item Retention 

Continuous Paired-Associate Learning 
The Spacing Effect 
Attenuation of Attention Hypothesis 
Spacing of Test Trials 


Introduction/overview 

We would like to ask you a question about the following situation: 

Student A and Student B are enrolled in the same class. An assignment
is given to the class and it is announced that an examination
(test of retention) will be given on the assignment later. Student A
and Student B spend an identical amount of time preparing for the
exam; however, Student A “distributes” her study time among several
practice periods while Student B “masses” his study in one
concentrated effort. Both students finish their preparation at the
same time.

The question, of course, is who will do better on the examination, Student 
A or Student B. Which is better, distributed practice or massed practice of 
to-be-remembered material? 



The problem is one of the oldest to be investigated by psychologists in
the field of learning and memory (see McGeoch & Irion, 1952). As you can
readily see, there are numerous areas in which an understanding of practice
effects may have application. Yet many questions regarding the most effective
distribution of study time are still unanswered, and recommendations given
by educators and psychologists regarding the most efficient practice are
sometimes in error or made without full knowledge of the limitations of practice
effects. Perhaps we can indicate something of the complexity of this issue
by looking at your answer to the question just presented.

Most people are likely to say Student A will do better on the examination.
After all, what student hasn’t heard about the “evils” of cramming?
Consider one such indictment against massed study, given by the eminent
19th-century psychologist William James (1890, p. 663), who had much to
say about memory processes that still rings true today.

The reason why cramming is such a bad mode of study is now
made clear. I mean by cramming that way of preparing for examinations
by committing “points” to memory during a few hours or
days of intense application immediately preceding the final ordeal,
little or no work having been performed during the previous
course of the term. Things learned thus in a few hours, on one
occasion, for one purpose, cannot possibly have formed many
associations with other things in the mind. Their brain-processes
are led into by few paths, and are relatively little liable to be awakened
again. Speedy oblivion is the almost inevitable fate of all that
is committed to memory in this simple way. Whereas, on the contrary,
the same materials taken in gradually, day after day, recurring
in different contexts, considered in various relations, associated
with other external incidents, and repeatedly reflected on, grow
into such a system, form such connections with the rest of the
mind’s fabric, lie open to so many paths of approach, that they
remain permanent possessions.


Apparently Student B doesn’t have a chance. His memory for the assignment
is destined to “speedy oblivion.” But before we rush to that conclusion, let’s
take a second opinion, one reached by a well-known experimental psychologist
who spent many years studying distribution of practice. (To make
things simpler, we will occasionally refer to this issue as the MP-DP problem,
meaning that massed practice, MP, and distributed practice, DP, of
to-be-remembered material have been compared.) In fact, Underwood and his
students at Northwestern University conducted research on the MP-DP issue
for so many years that he published an important article in 1961 under the
title, “Ten Years of Massed Practice on Distributed Practice.” After all that
practice, Underwood (1961, p. 230) reached the following conclusion:

If one wishes to use an efficiency measure for learning, it would
be very inefficient to learn by DP; the subject would be much
further ahead to learn by MP if total time to learn (including the
rest intervals in DP) is the criterion. Even under the most favorable
conditions for facilitation by DP, one could not recommend its use
in an applied setting where verbal materials are to be mastered.

Can this be right? Is there really no difference to speak of when massed and
distributed practice are compared? What about cramming? Was William
James wrong? Would Student B do as well as Student A?

These two psychologists’ conclusions are not really at odds. Whether
massed or distributed practice is a superior mode of study depends on
certain critical factors, which by the way were not identified in the situation
described for you. The best response to our question would have been to
ask for more information. What was the nature of the assignment? How long
was the interval between the end of practice and the retention test or
between periods of practice?

In this chapter we will first examine how psychologists investigated the
MP-DP issue using traditional verbal learning tasks such as serial learning
and paired-associate learning. These studies were overshadowed in the early
1960s by the discovery of MP-DP differences of another sort. Whereas the
MP-DP differences found in paired-associate learning resulted from
manipulating the interval between presentations of lists of to-be-remembered
items, more recent MP-DP phenomena have resulted from controlling the
interval between presentations of single items. The nature and extent of
MP-DP differences found when retention of individual items is investigated has
led to a theoretical controversy that is far from settled. We will review briefly
this research and the controversy surrounding it. Finally, we will introduce
you to another MP-DP phenomenon where the distribution of test trials, not
study trials, is of interest. Through it all you will gain important information
about how to organize practice to produce optimal learning and retention.


THE TRADITIONAL CASE FOR DP SUPERIORITY

The idea that distributed practice is better than massed practice has been 
around for some time. Based on his study of nonsense syllable lists, Ebbinghaus 
(1885/1964, p. 89) suggested that “with any considerable number of repetitions
a suitable distribution of them over a space of time is decidedly more
advantageous than the massing of them at a single time.” In his influential textbook,
McGeoch (1942, p. 119) stated, “The generalization that some form of positive 
distribution yields faster learning than does massed practice holds over so wide 
a range of conditions that it stands as one of our most general conclusions.” 
Although the results of many experiments, ranging from maze learning with
rats to inverted alphabet writing with humans, appeared to support this
statement, the magnitude of MP-DP differences frequently varied from experiment
to experiment. In fact, some experiments showed no facilitation for distributed
practice, and in some cases massed practice was found to be superior in task
acquisition. In the midst of these discordant findings it was not surprising that
researchers found it impossible to agree on a single explanation for MP-DP
differences. In a later edition of his textbook, McGeoch and Irion (1952)
proposed no fewer than eight theories to account for differences in learning and
retention as a function of distribution of practice.

Task acquisition 

Whether distributed practice is superior to massed practice in the
acquisition of a task is now known to depend chiefly on the nature of the task.
Learning of a perceptual-motor skill is generally more efficient when practice
is distributed than when it is massed. An early experiment by Lorge (1930,
reviewed by McGeoch & Irion, 1952) illustrated the dramatic differences
observed in the rate of skill learning when practice schedules were contrasted.
Subjects were given 20 practice trials on a mirror drawing task. Mirror drawing
requires that subjects draw a picture or geometric form by observing their
progress in the reversed image of the mirror. Not an easy task until you get the
hang of it. When practice was massed, one practice trial immediately followed
another. For distributed practice the trials were spaced at either 1-minute or
24-hour intervals. Figure 9-1 shows learning as a function of practice conditions.
Performance was clearly superior under conditions of distributed practice, and,
interestingly enough, there was little difference between a 1-minute and a
24-hour interval. Similar practice effects are commonly found when a
perceptual-motor skill is learned, and experiments similar to Lorge’s have become
standard laboratory demonstrations in experimental psychology courses. The
recommended mode of practice for perceptual-motor learning, whether it be mirror
drawing, bicycle riding, or gymnastics, is distributed practice.

When the task to be learned involves verbal material, there is often little
if any advantage found for distributed practice when compared with massed
practice. As we just saw, this was Underwood’s (1961) conclusion after he
reviewed many MP-DP studies that were performed using such traditional tasks as
paired-associate learning. In the majority of these studies learning was
considered to be massed when the interval between presentations of a list of items
was relatively short (for example, less than 8 seconds). Learning was defined as
distributed when the interval between list presentations was greater than 15
seconds but usually no longer than a couple of minutes. With practice defined
in this way distributed practice yielded superior learning of paired associates
only under a limited set of conditions, and even then the differences between
practice schedules were small. For example, DP schedules produced faster
learning than MP schedules only when the response terms in the paired-associate
list were likely to interfere with learning, as when a high degree of physical
similarity was present among nonsense-syllable responses (CAQ, CAK, KAQ).
The explanation for DP superiority in these cases was that an interval between
trials allowed for more effective extinction of error tendencies than was possible
under massed practice. With this information you may be questioning the value
of distributed practice when verbal materials, for instance, psychology facts, are
to be mastered. However, up to this point we have been discussing the
acquisition of a task. Retention is another story.

[Figure 9-1: line graph. x-axis: Trials (2 to 20); y-axis scale from 10 to 160;
separate curves for Massed Practice, 1-Minute Intervals, and 1-Day Intervals.]

Figure 9-1. Acquisition of a mirror drawing task by subjects given
massed practice or distributed practice (either 1 minute or 1 day
between practice periods). (From The Psychology of Human Learning,
Second Edition, by John A. McGeoch and Arthur L. Irion. Copyright 1952
by Longmans, Green & Co., Inc.; renewed © 1980 by Arthur L. Irion.
Reprinted by permission of Longman Inc., New York.)







Task retention 

Retention is generally greater after distributed practice than massed practice
if the intervals between practice periods are relatively long, for example,
24 hours (Cain & Willey, 1939; Keppel, 1964, 1967). When distributed practice
intervals are relatively short, no more than a few minutes, the magnitude of
retention effects as a function of practice schedule is generally small and, as we
saw for verbal learning, occurs only under a limited set of circumstances (see
Underwood & Schulz, 1961).

Keppel (1964, 1967) designed two experiments to assess the effect of
distributed practice on long-term retention of a list of paired associates when the
interval between practice was relatively long. In both experiments the critical
list consisted of paired nonsense syllables and adjectives. Subjects were asked
to learn a relationship between the items so that when presented with the
syllable, they could provide the adjective. Subjects studied the critical list for
eight trials. However, some subjects studied the list with the eight trials each
separated by only 4 seconds (MP); others had every two trials separated by
24-hour intervals (DP).

The two experiments differed according to what was learned prior to the 
critical list. In the first experiment (Keppel, 1964), the critical list was actually 
the fourth list subjects learned, the previous three lists having been learned 
under MP schedules and all having had the same syllables but different
adjectives as the critical list. As you may recall from Chapter 6, this situation can be
described as A-B, A-C, A-D, A-E. Significant proactive inhibition should be
present when retention of the critical list (A-E) is tested. In the second experiment
(Keppel, 1967), subjects learned only the critical list. Retention of the eight-pair 
list was tested in both experiments after 1 day or after 8 days. 

Results of the multiple-list experiment and the single-list experiment are 
shown in Figure 9-2. Memory for the adjective terms of the critical list was 
greater at both 1-day and 8-day retention intervals when learning was
distributed. The findings are particularly impressive in the multiple-list experiment,
where substantial proactive inhibition due to learning the previous three lists
was expected. After 1 day, in this case, there was massive retention loss for those
subjects who learned under massed conditions. When retention followed
distributed practice, subjects experienced no such retention loss.

Keppel’s (1967) explanation for the dramatic differences in retention seen
as a function of practice conditions turns out to be different from that provided
by William James. James argued that distributed practice allows more different
associations to be developed between the material and what we already know.
Increasing the number of associations to an item produced, in James’s words,
more “paths of approach,” or as we would say today, more retrieval cues. Modern
theorists refer to this type of explanation as a variable encoding hypothesis
(Martin, 1968; Melton, 1970). An encoding variability hypothesis assumes that
a to-be-remembered item is encoded differently from one presentation to
another. It is generally assumed that the degree of variable encoding is positively
related to the interval between presentations: the greater the interval, the more
likely an item is to be encoded differently on each of its presentations.

[Figure 9-2: graph of mean retention loss. x-axis: Retention Interval in Days.]

Figure 9-2. Mean retention loss over 1 day and 8 days as a function
of massed (MP) and distributed (DP) learning when a single list was
learned or when the critical list was the last learned in a multiple-list
experiment. (From “A Reconsideration of the Extinction-Recovery Theory,”
by G. A. Keppel. In Journal of Verbal Learning and Verbal Behavior,
1967, 6, 476-486. Copyright 1967 by Academic Press, Inc. Reprinted by
permission.)

Keppel’s explanation for superior retention following distributed practice
emphasized the quality, not the quantity, of associations that might develop
during distributed learning. He suggested that during acquisition a subject tries
to find some way to link the verbal items of a pair together, perhaps using some
word or idea to “hook up” the two items. That is, some associations that are
formed between members of a pair during learning will be better than others.
The better the associative link, the better the chance the item will be
remembered at some later time. (For example, see Hasher & Johnson, 1975, and
Chapter 7.) Now consider what possibly happens when learning is distributed
and intervals between practice sessions are quite long. As subjects return for
additional practice, they discover that some of their associations formed during
the previous practice session have been forgotten and that they must now find
new ways to hook up the verbal items. Therefore under distributed learning
the subjects have the opportunity to “test” their associations and continually to
replace weak associations with newer and more stable ones. At the end of
acquisition, it is assumed that the associations subjects developed under
distributed practice will be qualitatively better than those developed under massed
study; hence, retention should be greater after distributed learning.
Interestingly, there have been few attempts to distinguish between Keppel’s explanation
for DP superiority and that based on a variable encoding hypothesis (although
see Maki & Hasher, 1975).

DP superiority in the long run 

By now you should have sufficient information to answer the question we 
presented at the beginning of this chapter. Assuming that verbal material (for 
example, information from a textbook on psychology) is to be learned, then we 
cannot expect large differences in the acquisition rate of the material as a 
function of practice conditions. Student A and Student B could very well know 
the same amount of material at the end of their respective study periods. In fact 
if the retention test was given immediately after study, there might be little 
difference in their performance. This is something students probably discover 
for themselves and no doubt creates the dangerous impression that massed 
study is as “good” as distributed study. 

However, if there was a significant length of time between the end of study 
and the retention test and if the intervals between periods of distributed study 
were relatively long (24 hours, say), then we can be fairly confident that Student 
A will succeed and Student B will not. Further, because significant amounts of 
proactive inhibition will be present for most students (from all those previous 
courses and tests), when long-term retention is the goal, the use of distributed 
practice is particularly crucial. In short, a premedical student might perform 
adequately on a series of tests in various courses by massing study right up to 
the time of the examination. But should a comprehensive exam be required at 
some later time (for instance, the MCAT), this student will undoubtedly need
to spend more time on review than a student who consistently organized study 
in a distributed fashion. 

DP AND SINGLE-ITEM RETENTION 

On several occasions we have referred to an important experimental
procedure known as the Brown-Peterson paradigm. In Chapter 3 we saw that the
introduction of this paradigm in the late 1950s began a controversy that has
lasted for more than two decades. Is forgetting from primary memory due to
decay, interference, or displacement? In Chapter 7 we saw that the same
paradigm has been used to investigate multiple encoding in the context of release
from proactive inhibition (PI). The introduction of the Brown-Peterson task
signaled an important change in how psychologists investigated memory. This
technique, wherein an individual item is first presented and then retention is
tested after a distractor-filled interval, was an important break from more
traditional methodologies, such as paired-associate learning and serial learning,
wherein lists of items are presented for study before retention is tested. The
Brown-Peterson task helped to focus attention on the conditions surrounding
encoding and retention of individual items.

Continuous paired-associate learning 

Interest in the retention of single items led to yet another paradigm to 
investigate memory processes, one called continuous paired-associate learning 
(Peterson, Saltzman, Hillner, & Land, 1962). In this task subjects are given single
pairs of items to study. As is the case for more traditional paired-associate
learning, they are instructed to learn an association between the first and second
members of the pair so they can produce the second member when the first 
is presented. Unlike more traditional association tasks, however, retention of a 
single pair is tested after a short interval filled with the presentation and test of 
other paired associates. The continuous paired-associate learning task affords 
investigators an opportunity to examine the retention of individual associations 
over very brief intervals, as is done in the Brown-Peterson task. It also has the 
advantage of allowing data to be quickly collected on many pairs at several 
different retention intervals because study and test trials are intertwined
continually. Table 9-1 shows a typical study and test series when distribution of
practice is investigated in this type of task. 

In several experiments Peterson and his associates (Peterson et al., 1962;
Peterson, Wampler, Kirkpatrick, & Saltzman, 1963) used the continuous
paired-associate task to investigate retention differences as a function of the spacing of
two repetitions of a single paired associate. (Rather than talk about MP-DP
differences when retention of single items is tested, researchers prefer to
describe study conditions in terms of the spacing between individual items. A zero
spacing (that is, no intervening items) is equivalent to massed study; intervals
of spacing greater than zero are considered distributed study. Superior
retention following distributed study of single items relative to massed study is
frequently called a spacing effect.) In one such experiment (Peterson et al., 1963,
Experiment III) the pairs to be remembered were familiar single-syllable words.
They were presented once or twice before being tested. Half the twice-presented
pairs were presented at zero spacing; for the other half there were four
items intervening between repetitions. Both the massed and distributed items
were tested for retention after 4 and 16 seconds. At the long interval (16
seconds), retention of single pairs was greater following spaced than massed
presentations. The finding is something of a paradox because greater forgetting of
the first presentation would be expected with spaced than with massed
repetitions. However, two spaced presentations produced better retention than two
massed presentations.

TABLE 9-1. Example of a Continuous Paired-Associate Learning Procedure

Trial     Study (S) or     Type of item:
number    test (T) trial   single, MP, or DP    Item

  1       S                DP (1)               dog-car
  2       S                Single               cup-desk
  3       S                MP (1)               pen-wood
  4       S                MP (2)               pen-wood
  5       S                Single               rat-book
  6       S                DP (2)               dog-car
  7       T                Single               cup-?
  8       S                DP (1)               brick-tree
  9       T                MP                   pen-?
 10       T                Single               rat-?
 11       T                DP                   dog-?
 12       S                                     floor-card
 13       S                                     brick-tree

Note: To-be-remembered items are presented one time or two times in a massed or distributed
fashion. The number of items intervening between study and test trials determines the retention
interval. For example, if each item is presented for 4 seconds, then four intervening items produce
a retention interval of 16 seconds.
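
The arithmetic in the note to Table 9-1 can be checked with a short Python sketch. The trial sequence below is read off Table 9-1, with pairs ending in "?" treated as test trials and all others as study trials; that reading, the function name, and the 4-second presentation time (taken from the note's example) are assumptions made for illustration, not part of the original procedure.

```python
SECONDS_PER_ITEM = 4  # presentation time used in the note's example

# (kind, cue, response) for each trial; "S" = study trial, "T" = test trial.
trials = [
    ("S", "dog", "car"), ("S", "cup", "desk"), ("S", "pen", "wood"),
    ("S", "pen", "wood"), ("S", "rat", "book"), ("S", "dog", "car"),
    ("T", "cup", None), ("S", "brick", "tree"), ("T", "pen", None),
    ("T", "rat", None), ("T", "dog", None), ("S", "floor", "card"),
    ("S", "brick", "tree"),
]


def retention_intervals(trials, seconds_per_item=SECONDS_PER_ITEM):
    """Return {cue: seconds between its last study trial and its test}."""
    last_study = {}   # cue -> position of its most recent study trial
    intervals = {}
    for position, (kind, cue, _response) in enumerate(trials):
        if kind == "S":
            last_study[cue] = position
        elif cue in last_study:
            # Count only the trials that fall between study and test.
            intervening = position - last_study[cue] - 1
            intervals[cue] = intervening * seconds_per_item
    return intervals


print(retention_intervals(trials))
```

Under these assumptions every tested pair (cup, pen, rat, dog) has four intervening trials between its last study presentation and its test, which corresponds to the 16-second interval described in the note.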

Although continuous paired-associate learning sounds somewhat esoteric, 
it is actually similar to a task you have likely employed in your attempts to learn 
a list of facts. Students at all grade levels use flash cards to help their study. The 
to-be-remembered item is put on one or more cards and then a query card is 
constructed to permit a test of memory without revealing the complete answer. 
Many different study and query cards are assembled in a deck, and practice 
involves continuously presenting and testing the to-be-remembered items. The 
results of the Peterson experiments make it clear that when two study cards are 
used, their presentations should be spaced in the list. Later in this chapter we 
will have something to say about how to position a query card for optimal 
retention. 

The spacing effect 

Researchers quickly found that the spacing effect obtained with continuous
paired-associate learning was a remarkably robust phenomenon. It is produced
easily in many different memory tasks and with many different kinds of
materials, from nonsense syllables to words to sentences and pictures. An experiment
by Underwood (1970) reveals the magnitude of the spacing effect when the
free-recall method is used. He tested children between the ages of 9 and 14
years for their memory of 42 nouns. He presented words on a tape recorder at
the rate of one per second. Some of the words appeared only one time; others
appeared two, three, or four times. Half the repeated items were presented in
a massed fashion and the others were presented in a distributed fashion. The
spacing between repetitions of distributed words was nonsystematic; that is,
there was no set interval between item repetitions. Children’s recall of the words
was measured immediately after the last word was presented. The results of this
experiment are summarized in Figure 9-3.

Words presented in a spaced fashion were recalled better than massed 
items at each frequency level. The size of the spacing effect (the difference in 
retention following zero and spaced study) increased with the frequency of 
repetitions. In fact, recall of a word presented two times under distributed study 
was superior to that of a word presented four times under massed study. Results 
similar to these are found with adult subjects when free recall is tested. 

The spacing of repetitions in free recall 

Purpose 

To observe the differential effect on retention of repeating items in either a 
massed or distributed fashion in a list presented for free recall. 

Materials 

The free-recall list contains 28 common, two-syllable nouns (see Table C in 
the Appendix). Four of the nouns are randomly selected to serve as a primacy 
“buffer” and the remaining 24 are critical items. 

The independent variables are the number of presentations of an item (1
or 2) and the spacing between presentations (zero or 3-5 intervening items)
in the list. The procedure for constructing the list can best be explained by
considering the 24 items as representing four blocks of 6 items. In each block,
2 of the 6 items are presented only one time (1P), 2 are presented twice in
a massed fashion (MP, or zero spacing), and 2 are presented twice in a
distributed manner (DP). Therefore each block requires 10 list positions.
Spacing of the DP items is nonsystematic, with either 3, 4, or 5 items intervening
between the two presentations of an item. Two items of the same type (either
1P, MP, or DP) should not appear consecutively in a block. The four buffer
items are presented first. Two of these buffer items are presented once and
2 are presented twice, 1 MP and 1 DP. The final list therefore has 46 positions.
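
One way to assemble such a list is sketched below in Python. This is only an illustration under stated assumptions: the three block layouts and the buffer layout are hand-checked arrangements that satisfy the constraints just described (they are not taken from the published experiments), the nouns are placeholders standing in for the Appendix word list, and the function names are hypothetical. Other valid layouts exist, and a fuller version could search for them rather than pick from a fixed set.

```python
import random

# Hand-checked 10-position block layouts (an assumption of this sketch).
# S = presented once (1P), M = massed pair (MP), D = distributed pair (DP).
BLOCK_TEMPLATES = [
    ["D1", "S1", "D2", "M1", "M1", "D1", "S2", "D2", "M2", "M2"],  # DP spacing 4, 4
    ["D1", "S1", "D2", "M1", "M1", "D1", "M2", "M2", "D2", "S2"],  # DP spacing 4, 5
    ["D1", "S1", "D2", "S2", "D1", "M1", "M1", "D2", "M2", "M2"],  # DP spacing 3, 4
]
# Buffer layout: 2 items once, 1 massed, 1 distributed (6 positions).
BUFFER_TEMPLATE = ["D1", "S1", "M1", "M1", "S2", "D1"]
TYPE_NAMES = {"S": "1P", "M": "MP", "D": "DP"}


def fill(template, pool):
    """Give each role in the template its own word, taken from the pool."""
    assigned = {}
    for role in template:
        if role not in assigned:
            assigned[role] = pool.pop()
    return [(assigned[role], TYPE_NAMES[role[0]]) for role in template]


def build_list(words, seed=0):
    """Return the 46-position list as (word, item type) pairs."""
    rng = random.Random(seed)
    pool = list(words)
    rng.shuffle(pool)  # random assignment of words to conditions
    sequence = [(word, "buffer") for word, _ in fill(BUFFER_TEMPLATE, pool)]
    for _ in range(4):  # four blocks of 6 critical items each
        sequence += fill(rng.choice(BLOCK_TEMPLATES), pool)
    return sequence


# Placeholder nouns; the real list would come from the Appendix table.
words = [f"noun{i:02d}" for i in range(1, 29)]
study_list = build_list(words)
```

Because each word is tagged with its item type, the same structure can be reused at scoring time to count how many 1P, MP, and DP items a subject recalled while ignoring the buffer items.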





[Figure 9-3: line graph. x-axis: Frequency (1 to 4); y-axis: percentage of
recall (10 to 50); separate curves for massed (MP) and distributed presentations.]

Figure 9-3. Percentage of free recall of words by children ages 9-14
years as a function of frequency of repetitions and massed or distributed
presentations. (From “A Breakdown of the Total Time Law in Free-Recall
Learning,” by B. J. Underwood. In Journal of Verbal Learning and
Verbal Behavior, 1970, 9, 573-580. Copyright 1970 by Academic Press,
Inc. Reprinted by permission.)


Procedure

Free-recall instructions are first read to the subjects. Items are then presented
at the rate of one every 3 seconds. After hearing the last item, subjects are
asked to count backward by threes for 30 seconds. (If subjects are tested in
a group, the backward number counting should be performed silently.) Three
minutes are then given for free recall.

Instructions to Subjects

This is a free-recall experiment. I am going to read a list of words for you to
remember. Please listen to each word carefully because after I finish reading
them, I want you to write down as many as you can remember. You may
remember the words in any order you wish. Some of the words will be
repeated in the list. Do not let this disturb you. Simply try to remember as
many of the words in the list as possible. Are there any questions? (After the
last word is read, subjects are asked to count backward by threes starting from
a three-digit number the experimenter provides. Following 30 seconds of
backward number counting, 3 minutes are given for free recall.)

Summary and Analysis

The number of 1P, MP, and DP items each subject recalls is recorded. There
are eight items of each of these three types in the critical set. Buffer items are
not scored. The effect of spacing can be tested statistically by using a
repeated-measures (within-subjects) t test to compare the difference in mean recall
between MP and DP items. It will also be of interest to see how different
recall of MP items is relative to recall of 1P items.


Recommended Minimum Number of Subjects 

Total of 16. 

The procedure for this experiment is similar to that of several published experiments that 
have investigated spacing effects. 
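
The repeated-measures (within-subjects) t test mentioned in the Summary and Analysis can be sketched in a few lines of Python. The function name is hypothetical, and the scores are made-up numbers for four imaginary subjects, included only to show the calculation; they are not results from any experiment. The sketch computes the standard paired t statistic, the mean of the DP minus MP difference scores divided by its standard error, with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(mp_scores, dp_scores):
    """Return (t, degrees of freedom) for DP minus MP recall, paired by subject."""
    if len(mp_scores) != len(dp_scores):
        raise ValueError("each subject needs one MP score and one DP score")
    diffs = [dp - mp for mp, dp in zip(mp_scores, dp_scores)]
    n = len(diffs)
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Made-up recall counts (out of 8 items per type), for illustration only.
mp_recall = [3, 4, 2, 5]
dp_recall = [5, 5, 4, 6]
t_value, df = paired_t(mp_recall, dp_recall)
print(f"t({df}) = {t_value:.2f}")
```

The obtained t would then be compared with the critical value for the given degrees of freedom in a t table. With the recommended 16 subjects there would be 15 degrees of freedom. The same function could be applied to MP and 1P scores for the second comparison of interest.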


For those who enjoy empirical puzzles, the spacing effect has turned out 
to be a real delight. We will briefly consider one major approach to solving this 
puzzle. 

Attenuation of attention hypothesis 

Of the many theories offered to account for the spacing effect, the one 
receiving the most empirical support has been the attenuation of attention 
hypothesis (see Hintzman, 1974, 1976). This explanation was apparently first 
proposed by Peterson and his colleagues to explain the spacing effect in
continuous paired-associate learning (Peterson et al., 1963). The attenuation of
attention hypothesis makes the seemingly plausible assertion that the learner 
pays less attention to subsequent presentations of an item when repetitions 
occur close to its initial presentation (zero spacing) than when repetitions are 
spaced. If attention is attenuated, there is less processing. Therefore massed 
and spaced items will not be processed to the same degree. As reasonable as 
this hypothesis seems, not all the evidence has been favorable toward it. 

Major support for the attention hypothesis has been provided by experiments
designed to measure subjects’ degree of attention for repeated occurrences
of the same item. Shaughnessy, Zimmerman, and Underwood (1972), for
example, tested subjects for free recall of a list of words containing both massed 
and spaced presentations of repeated items. What made this spacing experiment 
different from others was that subjects were allowed to pace their own study of 
the to-be-remembered words. By pressing a button that activated a slide
projector, each subject could control the amount of time spent viewing individual
words in the list. The study times were automatically recorded for all items in 
the list. A typical spacing effect was obtained for retention of the words. Also, 
the results based on study times for individual items showed that when left on 
their own, subjects spent less time studying massed repetitions of an item than 
distributed repetitions. 

Another experiment supporting the attention hypothesis (Johnston & Uhl,
1976) required subjects to perform a simple reaction time task at the same time
they were attempting to learn words from a list containing both massed and
spaced repetitions. A weak auditory signal was presented to subjects in their left
ear while to-be-remembered words were presented in their right ear. The subjects
were instructed to press a button whenever they heard the tone but otherwise
to study the items for a memory test. This unique experimental arrangement
was based on the logic that the more processing a subject gave to
committing an item to memory, the less processing capacity would be “left over” 
to process the auditory signal. In other words, the faster the reaction time, the 
less effort the subject was presumably spending to memorize the list items. The 
important comparison, of course, involved reaction times to the auditory signal 
when it occurred during massed and spaced repetition. Figure 9-4 shows the 
reaction time results for once-presented items as well as for twice-presented 
items under the two conditions of spacing. Reaction times were clearly faster, 
and processing of list items presumably less, when the signal was heard during 
a massed presentation than when the signal was heard during a spaced presentation
of an item.

Figure 9-4. Mean reaction time scores for words appearing one time
(1P) and for words repeated four times in either a massed or distributed
fashion. Reaction times were obtained for each repetition position in the
massed or distributed series. (From “The Contributions of Encoding
Effort and Variability to the Spacing Effect on Free Recall,” by W. A.
Johnston and C. N. Uhl. In Journal of Experimental Psychology: Human
Learning and Memory, 1976, 2, 153-160. Copyright 1976 by the American
Psychological Association. Reprinted by permission.)

Although these studies have provided results in line with the attention
hypothesis, critics have argued that demonstrations of this sort are based on
“obtrusive” measures that may cause the subjects to behave in a way not typical
of their behavior in standard learning conditions (Hintzman & Stern, 1977).
Another criticism of the attention hypothesis arises from the fact that attempts 
to control subjects’ processing of massed repetitions have not been particularly 
successful in altering the spacing effect (for example, Hintzman, Summers, Eki, 
& Moore, 1975; Shaughnessy, 1976). A further argument against the attenuation 
of attention hypothesis is that it does not explain “why” a subject turns off 
processing under massed repetitions (Hintzman, 1974). Some of these criticisms
are considered in the next few paragraphs.

Although several different approaches have been taken in an attempt to 
alter subjects’ attention to massed presentations of an item, perhaps the most 
direct route was taken by Hintzman and his co-workers at the University of 
Oregon (Hintzman et al., 1975). These researchers used something that tends 
to get our attention very quickly—namely, money. Subjects viewed a lengthy list 
of pictures, which were presented at a 3-second rate. They were told that some 
of the pictures would be accompanied by a tone and that they should make a 
special effort to remember these pictures. They were also told that they would 
be paid for every picture they remembered and that they would receive four 
times as much for remembering pictures with the tone as for remembering 
pictures without tones ($.04 versus $.01). The tone, as you probably have 
guessed by now, appeared during the second presentation of a massed or 
spaced item repetition. Actually, half the repeated items were presented with 
the tone and half were not. The attention hypothesis predicts the tone would 
lessen the spacing effect. That is, the difference in retention for massed and 
spaced items should be less when subjects are paid to attend to the second 
presentation of repeated items. This did not happen. The overall retention level 
as measured by frequency judgments for repeated items was greater for pictures 
with tones than for pictures without tones, but the spacing effect was exactly 
the same for both types of presentations. Assuming that subjects increased their 
attention to the items when the tone was sounded, this increased attention did 
not seem to change the spacing effect. Of course, if subjects increased their 
attention equally to both massed and distributed items when they heard the 
tone, a spacing effect might still be expected because attention to massed presentations
was perhaps not yet equivalent to that for spaced items.

In defense of the attention hypothesis, there are several reasons subjects 
might turn off processing when an item is presented in a massed fashion. One 
is that a subject, having just worked on an item, may treat its repeated occurrence 
as time better spent on processing other items in the list (see Waugh, 1970). 
Another possibility is that having just seen an item, subjects are mistakenly led 
to think they already know it well enough to remember it and hence do not 
need to devote further processing to it (Shaughnessy, 1976; Zechmeister &
Shaughnessy, 1980). Motivational inducements to pay attention to massed repetitions
(for instance, offering extra money) might not be expected to work if
subjects perceive they already have the job done. A recent experiment reported 
results bearing on this explanation of attenuated processing of MP items. 

Zechmeister and Shaughnessy (1980) asked subjects to study a list of words 
in preparation for a free-recall test. For some of the words, which appeared 
once or twice in either a massed or spaced fashion, the subjects were asked to 
“predict” how well they thought they knew the item they had just studied. 
Subjects were given a rating scale on which to estimate their confidence that 
they would remember a particular item on the upcoming memory test. Ratings 
were made immediately following a once-presented item or after the second 
occurrence of a twice-presented item. (People’s ability to judge what they know 
and don’t know is an interesting topic in itself and will be treated in Chapter 
11.) These investigators found that subjects accurately predicted that items appearing
twice in the list would be remembered better than those appearing
once. However, subjects misjudged their ability to remember massed and distributed
items. Although recall was significantly higher for distributed than for
massed items (the spacing effect was found), average predictions of recall for 
these two kinds of items did not differ significantly and in fact were slightly 
higher for massed than distributed items. It is possible that subjects’ confidence 
in an item’s memory strength is somehow inflated by the massing of repetitions 
or, conversely, that repetition of an item after an interval teaches subjects that 
the item has dropped in strength. For example, when a to-be-remembered item 
is presented and then immediately repeated (zero spacing), subjects may turn 
off processing because they feel they have successfully solved the problems of 
getting the item into memory. 

Jacoby (1978) at the University of Toronto showed that there is a problem-solving
aspect to memory encoding in an interesting series of studies. He suggested
that we look at the task of memorizing a list of words as being similar
to that of solving a series of problems. The problem from the learner’s point of 
view is to find a way to put each of the items into memory. You will recall from 
our earlier discussion of MP-DP differences associated with list learning that 
Keppel’s explanation of the MP-DP effect was based on the idea that subjects 
are continually trying to solve the problem of hooking up pairs of items. Solving 
the problem when a single item is studied may involve forming an image, 
relating the word to other words in the list, taking advantage of some unusual 
association suggested by an item, or using whatever seems to make the item 
memorable. We shall review the results of one of Jacoby’s interesting experiments
in order to examine this problem-solving analysis of memory processing.

The task Jacoby gave his subjects was similar to solving a crossword puzzle. 
He presented two words simultaneously to the subject, one word serving as a 
cue for the other, which had some letters missing (for example, foot, s — e). The
subject’s task was to identify the partially spelled word (shoe, in this example).
He presented a number of such problems to subjects in a long list. For some 
of the problems the solution appeared in the list prior to the problem itself. 
That is, for our previous example the subjects would be presented with foot 
shoe some time before seeing the problem foot s — e. Of interest to Jacoby was 
the spacing between the solution and the problem. For some items the problem 
was made particularly easy in that the solution appeared immediately (no intervening
items) before the problem. For other items there were 20 intervening
items between the solution and the problem. The list also contained repetitions 
of solutions (no problem), which were either massed or spaced. 

Jacoby told his subjects he was interested in how long it took them to solve 
problems similar to those of a crossword puzzle. He informed subjects that 
their reaction times to solve the problems would be recorded. As soon as they 
knew the answer to a problem, they were to push a button and say aloud the 
word that fit the solution. Because some of the word pairs contained two intact 
words (that is, the solution was presented), subjects were also told that their 
reaction time to “read” the intact pairs (foot shoe) would be used to evaluate 
their times for problem pairs (foot s — e). Therefore when both words were 
intact, the subjects were to push a button and read the words aloud. Reaction 
times were not actually recorded and, unannounced to the subjects, a memory 
test followed the presentation of the list. For the memory test the left-hand 
member of each pair was presented and subjects were asked to provide the 
right-hand member. This type of test is called cued recall. 

The results of Jacoby’s rather unusual memory experiment are shown in 
Figure 9-5. These results demonstrate that constructing a solution to a problem 
produces greater retention than merely “reading” it (see also Slamecka & Graf, 
1978). Further, the advantage for constructing an answer depends on the spacing 
between the solution and the problem. When the problem immediately followed
the solution, the results were approximately the same as if two solutions
had been presented—that is, as if subjects had simply read the solution twice. 
When 20 items intervened between a problem and its solution, there was a 
sizable effect due to spacing. This spacing effect was much greater than when 
the second occurrence of an item was a solution repeated. Note that retention 
of a once-presented problem is greater than twice-presented solutions whether 
the solutions were massed or spaced. These results tell us that constructing an 
answer enhances retention, whereas mere repetition of a solution does not. In 
the next section we draw a parallel between construction of a solution and 
performance on a test trial. For the present let us consider the application of 
these results to the spacing effects as found in more standard memory tasks. 

[Figure 9-5 plot not reproduced. Vertical axis: probability of cued recall (.25 to .85). Horizontal axis: Immediate, Spaced, Once Presented. Plotted conditions: RC, RR, C, R.]

Figure 9-5. Probability of cued recall as a function of whether a problem
was merely read (R) or an answer was constructed (C) and when a
problem was read twice (RR) or first read and then an answer constructed
(RC) as a function of spacing. (From “On Interpreting the Effects of
Repetition: Solving a Problem versus Remembering a Solution,” by L. L.
Jacoby. In Journal of Verbal Learning and Verbal Behavior, 1978, 17,
649-667. Copyright 1978 by Academic Press, Inc. Reprinted by permission.)

Attenuation of attention may result from subjects being more likely to
assume that the problem of getting an item into memory has been solved when
it is immediately repeated than when a repetition is delayed. The consequence
would be making less of a response to a massed repetition of an item than to
a spaced repetition. As Jacoby (1978, p. 661) commented,

Presentation of an event whose solution or encoding can be easily remembered
does not give rise to an orienting response or heavily involve consciousness;
presentation of such an event will also have little impact on later
retention. The necessity of construction, in contrast, gives rise to an orienting
response, involves consciousness to a greater degree, and produces a substantial
effect on later retention performance. The spacing of repetitions has
its effect by determining whether a solution or encoding can be remembered
or must be constructed.

The attenuation of attention hypothesis is by no means the only explanation
offered to account for the effect of spacing on an item’s retention. Another
major hypothesis is the variable encoding hypothesis, which William James
invoked to explain the effect of distributed practice on long-term retention of
verbal material (see also Melton, 1970). Encoding variability might occur within
a list of items if it is assumed that relative to massed presentations, distributed 
presentations of an item lead it to be rehearsed within different groups of items 
(contexts) or encoded with different associations. However, tests of this hypothesis
as it applies to the retention of single items have not been particularly
supportive (Hintzman, 1974; Jacoby, 1978; Maki & Hasher, 1975). Yet more than 
one explanation of the spacing effect may be required. That is, although some 
researchers have assumed the reason for the spacing effect is the same in the 
many different situations in which it has been observed (for example, Hintzman, 
1974), others have questioned whether all spacing effects appear for the same 
reason (Underwood, Kapelak, & Malmi, 1976). For example, it seems likely that 
some form of attenuation of attention can account for part of the spacing effect 
but not all (Zimmerman, 1975). 

The mapping out of the various contributions of the available theories to 
the spacing effect will no doubt occupy researchers for some time. (You have 
to enjoy puzzles to be in this business.) At the same time, we must not lose sight 
of the fact that research on the spacing effect has revealed powerful effects on 
retention that can provide practical suggestions in many areas in which memory 
is tested. 

The spacing effect is not limited to tasks employing item-by-item presentation
of to-be-remembered words. Hall, Smith, Wegener, and Underwood (in
press) presented college students with words for free recall, either item by item 
or in a complete list. In the complete-list condition, words were typed in a 
single column, double spaced, in the middle of a sheet of paper. In the item- 
by-item condition, a slide projector was used to present the words. In both 
conditions some of the words appeared more than once, and their occurrences 
were either massed or spaced. In the complete-list condition, the repeated items 
occurred either next to one another or with other items intervening in the 
column. 

Retention was better for spaced than for massed items in both presentation 
methods, although the difference was smaller following complete-list presentation.
Further, even though total study time was the same for the two presentation
conditions, recall was significantly greater after complete-list presentation
than after the usual laboratory procedure of presenting items one at a time. To
explain these findings, the researchers suggested that during complete-list presentation,
subjects may go through the list more than one time. Recycling study
would have the effect of giving spaced presentation to all items in the list and
would therefore raise the overall level of recall. Also, items that were massed
would now be “spaced,” which would reduce the overall spacing effect of complete-list
presentation. As we saw, this is just what happened. Therefore, not
only is the spacing effect found across a wide variety of tasks and procedures,
but this interesting phenomenon may be the basis for improved retention whenever
learners study in a manner that produces the functional equivalent of
spaced presentation. 

Spacing of test trials 

Sometime in your psychology career you likely discussed the shaping of 
behavior according to principles of operant conditioning. Perhaps you have 
even seen a “live” demonstration of shaping that involved teaching a rat to press 
a bar for food reinforcement. Shaping is the reinforcement of successive approximations
to a desired response. For example, to shape a rat to press a bar,
you might first give reinforcement when the animal is standing next to the bar. 
You might give subsequent reinforcement only when the animal moves closer 
to the desired response—for example, raising a paw near the bar. You continue 
this process until the animal is performing the desired behavior. Shaping is a 
powerful learning technique, which, although often illustrated with a rat learning
to press a bar, nevertheless has numerous applications in situations where
a new behavior is to be acquired. 

What behavior do we wish to acquire in a memory task? Most likely it is 
the ability to recall something after a long interval. Perhaps “shaping” could be 
accomplished by gradually lengthening the interval between study and test until 
it approaches a long interval. By using spaced repetitions of test trials rather 
than study trials, Landauer and Bjork (1978) revealed how memory behavior 
just might be shaped in this manner. There has been little research on the effect 
of spacing of test trials when single items are presented for a memory test 
(although see Whitten & Bjork, 1977). Therefore the results of this experiment 
present a new look at spacing effects. 

These researchers presented large groups of subjects the task of learning 
people’s names. Each subject was given a deck of cards on which the names of 
fictitious individuals were written. Study cards combined both the first and last
names of the individual; test cards contained only the first name. As in traditional 
paired-associate learning, subjects were to provide the appropriate second 
name when only the first name was given. After both names were presented 
once for study, the first name only was presented for three successive test trials. 
The major variable was the nature of the spacing between the three test cards. 
The various patterns of spacing that Landauer and Bjork (1978) used are shown 
in Figure 9-6. As you can see, several “uniform” spacings were contrasted with 
both “expanding” and “contracting” spacings. Expanding spacing would be most 
similar to shaping. If the desired behavior is long-term retention, an expanding 
series first gives the subject an opportunity to remember after a short interval, 
then after a little longer interval, and then after an even longer interval. This
presumably approaches the desired response of long-term retention. In one
expanding pattern of test trials, the first test appeared after one intervening
item, then was repeated after four more intervening items, and finally was
presented for the third time following ten additional items (pattern: 1, 4, 10).

Spacing of Tests

Uniform Short:     0, 0, 0 and 1, 1, 1
Uniform Moderate:  4, 4, 4 and 5, 5, 5
Uniform Long:      9 ≤ (x, y, z) ≤ 11; average 9.3 to 10.3
Expanding:         0, 3, 10 and 1, 4, 10
Contracting:       10, 3, 0 and 10, 4, 1

Figure 9-6. Types of spacing patterns used by Landauer and Bjork
(1978). An item was presented once for study and then followed by three
test trials. Therefore there were three intervals between tests. Numbers
in the figure represent the number of intervening items between successive
test trials for a particular item type. (From “Optimum Rehearsal
Patterns and Name Learning,” by T. K. Landauer and R. A. Bjork. In M.
M. Gruneberg, P. E. Morris, and R. N. Sykes (Eds.), Practical Aspects of
Memory. Copyright 1978 by Academic Press, Inc. (London) Ltd. Used by
permission.)

A cued-recall test of the fictitious names was given 30 minutes after presentation
of the list. These results are shown in Figure 9-7 in terms of the
average spacing of the tests. For example, the two moderate uniform intervals, 
4, 4, 4 and 5, 5, 5, were combined for an average spacing of 4.50. Performance 
was measured both in terms of proportion recall (left ordinate) and percentage 
improvement over retention of a name presented one time for study with no 
subsequent test trials (P only; right ordinate). As you can see from Figure 9-7, 
memory performance was best for the expanding test series. We can assume 
that shaping worked by guaranteeing a high probability of correct response 
at each retention interval. A subject is more likely to respond correctly after an 
initial retention interval of one item (expanding) than a retention interval of 
five items (uniform). 


Further research on “programmed testing” may have important consequences
for the way material is presented for optimal retention. The next time
you make a deck of flash cards for study, you might try testing yourself according 
to an expanding series. We hope your memory will be appropriately shaped. 
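For readers who want to try this with their own flash cards, the following Python sketch lays out one card's study and test positions in a deck. It is an illustration only, not the procedure Landauer and Bjork used to build their decks: the function names (is_expanding, build_deck), the "study"/"test"/"filler" labels, and the use of other cards as fillers are our assumptions. The default gaps are the 1, 4, 10 expanding pattern described earlier.

```python
def is_expanding(gaps):
    """True when each interval is longer than the one before it."""
    return all(a < b for a, b in zip(gaps, gaps[1:]))


def build_deck(target, fillers, gaps=(1, 4, 10)):
    """Order the cards: one study card, then tests separated by filler cards.

    `gaps` gives the number of filler cards before each successive test.
    """
    if len(fillers) < sum(gaps):
        raise ValueError("not enough filler cards for the requested gaps")
    deck = [("study", target)]
    start = 0
    for gap in gaps:
        # Place `gap` filler cards, then a test card for the target item.
        deck.extend(("filler", card) for card in fillers[start:start + gap])
        start += gap
        deck.append(("test", target))
    return deck


if __name__ == "__main__":
    fillers = [f"filler {i}" for i in range(15)]
    for position, (kind, card) in enumerate(build_deck("target name", fillers)):
        if kind != "filler":
            print(position, kind, card)
```

In a real deck the filler positions would be occupied by the study and test cards of other items, each following its own pattern; the sketch tracks a single item to keep the spacing visible.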


Summary 

In this chapter we have focused on a problem of both theoretical and practical
significance. How does repetition of to-be-remembered information
affect its retention? The question has often been investigated by comparing
the effects of massed (MP) and distributed practice (DP). Results of early
studies revealed that although acquisition of a verbal task (such as paired-associate
learning) was not necessarily affected by practice conditions, retention
of verbal material was. Specifically, retention is generally better following
DP than MP when the interval between practice periods is relatively
long. Explanations for this effect include those that emphasize differences
in number of encodings (the encoding variability hypothesis) and differences
in the quality of encoding as a function of practice.

When we look at the conditions surrounding the acquisition and retention
of a single item, we find that retention is substantially better when item
repetitions are spaced rather than massed. A major explanation of this “spacing
effect” is the attenuation of attention hypothesis, which suggests that
subjects “turn off” processing of massed but not distributed repetitions of 
an item. There is support for this hypothesis, but it appears that other factors 
are involved. Encoding variability has also been offered as an explanation 
of this effect. Spacing effects are also obtained when retention is observed 
following different distributions of test trials rather than study trials. 
Although the effects are new and additional research is needed, there is 
some evidence that an expanding spacing of test trials leads to optimal 
retention. 

The theoretical controversy surrounding explanations of practice effects
should not detract from the many possible practical applications offered
by our knowledge of these phenomena. Laboratory research will no
doubt continue on this problem for some time, but an equally important
task is to test the application of these powerful effects in “real-life” situations.

Recommendations for further reading 

Melton (1970) has provided an interesting historical review of the research
on massed and distributed practice. This review appears in the same
issue of the Journal of Verbal Learning and Verbal Behavior as do several 
other important articles dealing with this problem. More recent reviews and 
a discussion of the theoretical controversy surrounding the spacing effect are 
found in articles by Hintzman (1974, 1976). Selected aspects of the spacing 
problem are also reviewed and a new theoretical analysis of the encoding 
variability hypothesis of the spacing effect is presented in a recent article by 
Glenberg (1979). For those interested in learning more about perceptual- 
motor tasks, we recommend chapters by Bilodeau (1969) and Noble (1978). 






